Abstract. In graphical modelling, a bi-directed graph encodes marginal independences
among random variables that are identified with the vertices of the graph. We
show how to transform a bi-directed graph into a maximal ancestral graph that (i)
represents the same independence structure as the original bi-directed graph, and (ii)
minimizes the number of arrowheads among all ancestral graphs satisfying (i). Here the
number of arrowheads of an ancestral graph is the number of directed edges plus twice
the number of bi-directed edges. In Gaussian models, this construction can be used for
more efficient iterative maximization of the likelihood function and to determine when
maximum likelihood estimates are equal to empirical counterparts.



1. Introduction 

In graphical modelling, bi-directed graphs encode marginal independences among
random variables that are identified with the vertices of the graph (Kauermann, 1996;
Pearl and Wermuth, 1994; Richardson, 2003). In particular, if two vertices are not joined
by an edge, then the two associated random variables are assumed to be marginally
independent. For example, the graph G in Figure 1, whose vertices are to be identified
with a random vector $(X_1, X_2, X_3, X_4)$, represents the pairwise marginal independences
$X_1 \perp\!\!\!\perp X_3$, $X_1 \perp\!\!\!\perp X_4$, and $X_2 \perp\!\!\!\perp X_4$. While other authors (Cox and Wermuth, 1993, 1996;
Edwards, 2000) have used dashed edges to represent marginal independences, the
bi-directed graphs we employ here make explicit the connection to path diagrams
(Koster, 1999; Wright, 1934).

Gaussian graphical models for marginal independence, also known as covariance graph
models, impose zero patterns in the covariance matrix, which are linear hypotheses on
the covariance matrix (Anderson, 1973). The graph in Figure 1, for example, imposes
$\sigma_{13} = \sigma_{14} = \sigma_{24} = 0$. An estimation procedure designed for covariance graph models is
described in Drton and Richardson (2003); see also Chaudhuri et al. (2007). Other recent
work involving these models includes Mao et al. (2004) and Wermuth et al. (2006).
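As an illustration of such a zero pattern, the covariance matrix of $(X_1, X_2, X_3, X_4)$ under the graph of Figure 1 can be written out as follows. The display is only a sketch of the constraints stated above; it assumes, as the three independences imply, that the adjacent pairs are (1,2), (2,3) and (3,4).

```latex
\[
\Sigma =
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & 0           & 0           \\
\sigma_{12} & \sigma_{22} & \sigma_{23} & 0           \\
0           & \sigma_{23} & \sigma_{33} & \sigma_{34} \\
0           & 0           & \sigma_{34} & \sigma_{44}
\end{pmatrix}
\]
```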

In this paper we employ the connection between bi-directed graphs and the more
general ancestral graphs with undirected, directed, and bi-directed edges (Section 2). For
the statistical motivation of ancestral graphs see Richardson and Spirtes (2002); for causal
interpretation see Richardson and Spirtes (2003). We show how to construct a maximal
ancestral graph $G^{\min}$, which we call a minimally oriented graph, that is Markov equivalent
to a given bi-directed graph G and such that the number of arrowheads is minimal
(Sections 3 and 4). Two ancestral graphs are Markov equivalent if the independence models
associated with the two graphs coincide; see for example Roverato (2005) for some recent
results on Markov equivalence of different types of graphs. The number of arrowheads
is the number of directed edges plus twice the number of bi-directed edges. Minimally
oriented graphs provide useful nonparametric information about Markov equivalence of
bi-directed, undirected and directed acyclic graphs. For example, the graph G in Figure
1 is not Markov equivalent to an undirected graph because $G^{\min}$ is not an undirected
graph, and G is not Markov equivalent to a DAG because $G^{\min}$ contains a bi-directed
edge. The graph in Figure 1 has a unique minimally oriented graph but in general,
minimally oriented graphs are not unique. Our construction procedure (Algorithm 14)
involves a choice of a total order among the vertices. Varying the order one may obtain
all minimally oriented graphs.

Figure 1. A bi-directed graph G with (unique) minimally oriented graph $G^{\min}$.

For covariance graph models, minimally oriented graphs allow one to determine when
the maximum likelihood estimate of a variance or covariance is available explicitly as
its empirical counterpart (Section 5). For example, since no arrowheads appear at the
vertices 1 and 4 in the graph $G^{\min}$ in Figure 1, the maximum likelihood estimates of $\sigma_{11}$
and $\sigma_{44}$ must be equal to the empirical variances of $X_1$ and $X_4$, respectively. The likelihood
function for covariance graph models may be multi-modal, though simulations suggest
this only occurs at small sample sizes, or under mis-specification (Drton and Richardson,
2004a). However, when a minimally oriented graph reveals that a parameter estimate is
equal to an empirical quantity (such as $\sigma_{11}$ and $\sigma_{44}$ in the above example) then even if
the likelihood function is multi-modal this parameter will take the same value at every
mode. Perhaps most importantly, minimally oriented graphs allow for computationally 
more efficient maximum likelihood fitting; see Remark [23] and the example in Section 



2. Ancestral graphs and their global Markov property 

This paper deals with simple mixed graphs, which feature undirected ($v - w$), directed
($v \to w$) and bi-directed edges ($v \leftrightarrow w$) under the constraint that there is at most one
edge between two vertices. In this section we give a formal definition of these graphs and 
discuss their Markov interpretation. 

2.1. Simple mixed graphs. Let $\mathcal{E} = \{\emptyset, -, \leftarrow, \to, \leftrightarrow\}$ be the set of possible edges
between an ordered pair of vertices; $\emptyset$ denoting that there is no edge. A simple mixed
graph $G = (V, E)$ is a pair of a finite vertex set V and an edge map $E : V \times V \to \mathcal{E}$. The
edge map E has to satisfy that for all $v, w \in V$,

(i) $E(v, v) = \emptyset$, i.e., there is no edge between a vertex and itself,

(ii) $E(v, w) = E(w, v)$ if $E(v, w) \in \{-, \leftrightarrow\}$,

(iii) $E(v, w) = \to \iff E(w, v) = \leftarrow$.
In the sequel, we write $v - w \in G$, $v \to w \in G$, $v \leftarrow w \in G$ or $v \leftrightarrow w \in G$ if $E(v, w)$
equals $-$, $\to$, $\leftarrow$ or $\leftrightarrow$, respectively. If $E(v, w) \neq \emptyset$, then v and w are adjacent. If there
is an edge $v \leftarrow w \in G$ or $v \leftrightarrow w \in G$ then there is an arrowhead at v on this edge. If
there is an edge $v \to w \in G$ or $v - w \in G$ then there is a tail at v on this edge. A vertex
w is in the boundary of v, denoted by $\mathrm{bd}(v)$, if v and w are adjacent. The boundary of a
vertex set $A \subseteq V$ is the set $\mathrm{bd}(A) = [\cup_{v \in A} \mathrm{bd}(v)] \setminus A$. We write $\mathrm{Bd}(v) = \mathrm{bd}(v) \cup \{v\}$ and
$\mathrm{Bd}(A) = \mathrm{bd}(A) \cup A$. An induced subgraph of G over a vertex set A is the mixed graph
$G_A = (A, E_A)$ where $E_A$ is the restriction of the edge map E on $A \times A$. The skeleton of
a simple mixed graph is obtained by making all edges undirected.

In a simple mixed graph a sequence of adjacent vertices $(v_1, \ldots, v_k)$ uniquely determines
the sequence of edges joining consecutive vertices $v_i$ and $v_{i+1}$, $1 \le i \le k - 1$. Hence,
we can define a path $\pi$ between two vertices v and w as a sequence of distinct vertices
$\pi = (v, v_1, \ldots, v_k, w)$ such that each vertex in the sequence is adjacent to its predecessor
and its successor. A path $\pi$ with all edges of the form $\to$ and pointing toward
w is a directed path from v to w. If there is such a directed path from v to $w \neq v$, or
if $v = w$, then v is an ancestor of w. We denote the set of all ancestors of a vertex v
by $\mathrm{An}(v)$ and for a vertex set $A \subseteq V$ we define $\mathrm{An}(A) = \cup_{v \in A} \mathrm{An}(v)$. Finally, a directed
path from v to w together with an edge $w \to v \in G$ is called a directed cycle.

Important subclasses of simple mixed graphs are illustrated in Figure 2. Bi-directed,
undirected and directed graphs contain only one type of edge. Directed acyclic graphs
(DAGs) are directed graphs without directed cycles. These three types of graphs are
special cases of ancestral graphs (Richardson and Spirtes, 2002).



Definition 1. A simple mixed graph G is an ancestral graph if it holds that

(i) G does not contain any directed cycles;

(ii) if $v - w \in G$, then there does not exist u such that $u \to v \in G$ or $u \leftrightarrow v \in G$;

(iii) if $v \leftrightarrow w \in G$, then v is not an ancestor of w.
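The three conditions of Definition 1 can be checked mechanically. The following is a minimal sketch and not part of the paper's development: the edge encoding (a dict mapping an ordered pair to one of the strings "-", "->", "<->") and the name `is_ancestral` are assumptions made for illustration, and the example graph 1 -> 2 <-> 3 <- 4 is assumed to be the graph $G^{\min}$ described in the Introduction.

```python
def is_ancestral(vertices, edges):
    """Check conditions (i)-(iii) of Definition 1 for a simple mixed graph.

    edges maps an ordered pair (v, w) to "-", "->" (meaning v -> w) or "<->".
    """
    children = {v: set() for v in vertices}
    has_arrowhead = set()  # vertices with an arrowhead on some edge
    for (v, w), kind in edges.items():
        if kind == "->":
            children[v].add(w)
            has_arrowhead.add(w)
        elif kind == "<->":
            has_arrowhead.update((v, w))

    def descendants(v):
        # Vertices reachable from v along a directed path with at least one edge.
        seen, stack = set(), list(children[v])
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                stack.extend(children[u])
        return seen

    desc = {v: descendants(v) for v in vertices}
    if any(v in desc[v] for v in vertices):
        return False  # (i) a directed cycle exists
    for (v, w), kind in edges.items():
        if kind == "-" and (v in has_arrowhead or w in has_arrowhead):
            return False  # (ii) arrowhead at an endpoint of an undirected edge
        if kind == "<->" and (w in desc[v] or v in desc[w]):
            return False  # (iii) one endpoint of v <-> w is an ancestor of the other
    return True


# Assumed example: 1 -> 2 <-> 3 <- 4.
g_min = {(1, 2): "->", (2, 3): "<->", (4, 3): "->"}
print(is_ancestral([1, 2, 3, 4], g_min))
```

Because condition (iii) is applied to both endpoints of a bi-directed edge, the check does not depend on the order in which the pair is stored.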

2.2. Global Markov property for ancestral graphs. Ancestral graphs can be given
an independence interpretation, known as the global Markov property, by a graphical
separation criterion called m-separation (Richardson and Spirtes, 2002, §3.4). An extension
of Pearl's (1988) d-separation for DAGs, m-separation uses the notion of colliders.
A non-endpoint vertex $v_i$ on a path is a collider on the path if the edges preceding and
succeeding $v_i$ on the path both have an arrowhead at $v_i$, that is, $v_{i-1} \to v_i \leftarrow v_{i+1}$,
$v_{i-1} \to v_i \leftrightarrow v_{i+1}$, $v_{i-1} \leftrightarrow v_i \leftarrow v_{i+1}$ or $v_{i-1} \leftrightarrow v_i \leftrightarrow v_{i+1}$ is part of the path. A
non-endpoint vertex that is not a collider is a non-collider on the path.

Definition 2. A path $\pi$ between vertices v and w in a simple mixed graph G is
m-connecting given a possibly empty set $C \subseteq V \setminus \{v, w\}$ if (i) every non-collider on $\pi$ is not
in C, and (ii) every collider on $\pi$ is in $\mathrm{An}(C)$. If no path m-connects v and w given C,
then v and w are m-separated given C. Two non-empty and disjoint sets A and B are
m-separated given $C \subseteq V \setminus (A \cup B)$, if any two vertices $v \in A$ and $w \in B$ are m-separated
given C.
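For small graphs, Definition 2 can be evaluated by brute force over all paths. The sketch below is illustrative only: the function name `m_connected` and the edge encoding (a dict mapping an ordered pair to "-", "->" or "<->") are assumptions, and enumerating paths is exponential in general, so this is not an efficient separation test.

```python
def m_connected(vertices, edges, v, w, C):
    """Return True if some path m-connects v and w given the set C."""
    mark = {}  # mark[(a, b)]: edge mark at b on the edge between a and b
    parents = {u: set() for u in vertices}
    for (a, b), kind in edges.items():
        mark[(a, b)] = "tail" if kind == "-" else "head"
        mark[(b, a)] = "head" if kind == "<->" else "tail"
        if kind == "->":
            parents[b].add(a)

    # An(C): every vertex with a directed path into C, including C itself.
    an_C, stack = set(), list(C)
    while stack:
        u = stack.pop()
        if u not in an_C:
            an_C.add(u)
            stack.extend(parents[u])

    def extend(path):
        last = path[-1]
        if last == w:
            return True
        for nxt in vertices:
            if nxt in path or (last, nxt) not in mark:
                continue
            if len(path) >= 2:  # `last` is a non-endpoint vertex
                collider = (mark[(path[-2], last)] == "head"
                            and mark[(nxt, last)] == "head")
                if collider and last not in an_C:
                    continue  # violates condition (ii)
                if not collider and last in C:
                    continue  # violates condition (i)
            if extend(path + [nxt]):
                return True
        return False

    return extend([v])


# Assumed example: the path 1 -> 2 <-> 3 <- 4 on four vertices.
g = {(1, 2): "->", (2, 3): "<->", (4, 3): "->"}
print(m_connected([1, 2, 3, 4], g, 1, 3, set()))  # vertex 2 is a collider outside An(C)
print(m_connected([1, 2, 3, 4], g, 1, 3, {2}))    # conditioning on the collider opens the path
```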

Figure 2. Simple mixed graphs: (i) a bi-directed graph, (ii) an undirected graph,
(iii) a DAG, (iv) an ancestral graph.

Let $G = (V, E)$ be an ancestral graph whose vertices index a random vector $(X_v \mid
v \in V)$. For $A \subseteq V$, let $X_A$ be the subvector $(X_v \mid v \in A)$. The global Markov
property for G states that $X_A$ is conditionally independent of $X_B$ given $X_C$ whenever A,
B and C are pairwise disjoint subsets such that A and B are m-separated given C in G.
Subsequently, we denote such conditional independence using the shorthand $A \perp\!\!\!\perp B \mid C$
that avoids making the probabilistic context explicit. The global Markov property, when
applied to each of the graphs in Figure 2 in turn, implies (among other independences)
that:



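(i) $v \perp\!\!\!\perp y$ and $w \perp\!\!\!\perp x$;

(ii) $v \perp\!\!\!\perp y \mid \{w, x\}$ and $w \perp\!\!\!\perp x \mid \{v, y\}$;

(iii) $v \perp\!\!\!\perp y \mid \{w, x\}$ and $w \perp\!\!\!\perp x$;

(iv) $v \perp\!\!\!\perp y \mid x$ and $w \perp\!\!\!\perp x \mid v$.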



If G is a bi-directed graph, then the global Markov property states the marginal
independence $v \perp\!\!\!\perp w$ if v and w are not adjacent. In a multivariate normal distribution
such pairwise marginal independences hold iff all independences stated by the global
Markov property for G hold (Kauermann, 1996). Without any distributional assumption,
Richardson (2003, §4) shows that the independences stated by the global Markov property
of a bi-directed graph hold iff certain (not only pairwise) marginal independences hold;
see also Matúš (1994).

The graphs in Figure 2 have the property that for every pair of non-adjacent vertices
v and w there exists some subset C such that the global Markov property states that
$v \perp\!\!\!\perp w \mid C$. Ancestral graphs with this property are called maximal. If an ancestral graph
G is not maximal, then there exists a unique Markov equivalent maximal ancestral graph
$\bar{G}$ that contains all the edges present in G. Moreover, any edge in $\bar{G}$ that is not present in
G is bi-directed (Richardson and Spirtes, 2002, §3.7). Two ancestral graphs $G_1$ and $G_2$
are Markov equivalent if they have the same vertex set and the global Markov property
states the same independences for $G_1$ as for $G_2$.

The following facts are easily established; see also Richardson and Spirtes (2002).



Lemma 3. (i) Markov equivalent maximal ancestral graphs have the same skeleton.

(ii) If G is an ancestral graph that is Markov equivalent to a maximal ancestral graph
$\bar{G}$ and has the same skeleton as $\bar{G}$, then G is also a maximal ancestral graph.

(iii) Bi-directed, undirected and directed acyclic graphs are maximal ancestral graphs. 

2.3. Boundary containment. In the subsequent Sections 3 and 4 we will construct
maximal ancestral graphs that are Markov equivalent to a given bi-directed graph. Via
Theorem 5 below, the following property plays a crucial role in these constructions.

Definition 4. A simple mixed graph G has the boundary containment property if for
all distinct vertices $v, w \in V$ the presence of an edge $v - w$ implies that $\mathrm{Bd}(v) = \mathrm{Bd}(w)$
and the presence of an edge $v \to w$ in G implies that $\mathrm{Bd}(v) \subseteq \mathrm{Bd}(w)$.
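Definition 4 only involves the skeleton and the edge types, so it is straightforward to test. The following sketch is a hedged illustration: the names `closed_boundaries` and `has_boundary_containment` and the dict-based edge encoding ("-", "->", "<->") are assumptions, and the example graphs are assumed rather than taken from a figure.

```python
def closed_boundaries(vertices, edges):
    """Return Bd(v) = bd(v) united with {v} for every vertex."""
    Bd = {v: {v} for v in vertices}
    for v, w in edges:  # iterating the dict yields the ordered pairs
        Bd[v].add(w)
        Bd[w].add(v)
    return Bd


def has_boundary_containment(vertices, edges):
    """Check Definition 4: v - w needs Bd(v) == Bd(w); v -> w needs Bd(v) within Bd(w)."""
    Bd = closed_boundaries(vertices, edges)
    for (v, w), kind in edges.items():
        if kind == "-" and Bd[v] != Bd[w]:
            return False
        if kind == "->" and not Bd[v] <= Bd[w]:
            return False
    return True


# Assumed examples on the skeleton 1 - 2 - 3 - 4.
good = {(1, 2): "->", (2, 3): "<->", (4, 3): "->"}
bad = {(2, 1): "->", (2, 3): "<->", (4, 3): "->"}  # Bd(2) is not contained in Bd(1)
print(has_boundary_containment([1, 2, 3, 4], good))
print(has_boundary_containment([1, 2, 3, 4], bad))
```

Bi-directed edges impose no condition, which is why a bi-directed graph satisfies the property trivially.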

In the Appendix we present lemmas on the structure of m-connecting paths in graphs 
with the boundary containment property. These lemmas yield the following key result. 

Theorem 5. If G is an ancestral graph that has the same skeleton as a bi-directed graph
$\bar{G}$, then G and $\bar{G}$ are Markov equivalent iff G has the boundary containment property.

Proof. Two vertices are adjacent in G iff they are adjacent in $\bar{G}$. Therefore, G and $\bar{G}$ are
Markov equivalent iff it holds that two non-adjacent vertices v and w are m-connected
given $C \subseteq V$ in G iff they are m-connected given C in $\bar{G}$.

($\Rightarrow$) Suppose G does not have the boundary containment property, i.e., there exists
an edge $v - w \in G$ or an edge $v \to w \in G$ such that $\mathrm{Bd}(v) \not\subseteq \mathrm{Bd}(w)$. Choose $u \in
\mathrm{Bd}(v) \setminus \mathrm{Bd}(w)$. Since u and w are not adjacent, they are m-separated given $C = \emptyset$ in $\bar{G}$.
In G, however, the path $(u, v, w)$ m-connects u and w given $C = \emptyset$. Hence, G and $\bar{G}$ are
not Markov equivalent.

($\Leftarrow$) First, let v and w be non-adjacent vertices that are m-connected given $C \subseteq V$
in G. By Lemma 29, there is a path $\pi = (v, v_1, \ldots, v_k, w)$ that m-connects v and w
given C in G and is such that $v_1, \ldots, v_k$ are colliders with $\{v_1, \ldots, v_k\} \subseteq C$. Since $\bar{G}$ is a
bi-directed graph, the corresponding path $\bar{\pi} = (v, v_1, \ldots, v_k, w)$ in $\bar{G}$ also m-connects v
and w given C.

Conversely, let v and w be non-adjacent vertices that are m-connected given $C \subseteq V$
in $\bar{G}$. Let $\bar{\pi} = (v_0, v_1, \ldots, v_k, v_{k+1})$ m-connect $v = v_0$ and $w = v_{k+1}$ given C in $\bar{G}$
such that no shorter path m-connects v and w given C. Then $v_1, \ldots, v_k$ are colliders,
$\{v_1, \ldots, v_k\} \subseteq C$, and $v_{i-1}$ and $v_{i+1}$, $i = 1, \ldots, k$, are not adjacent in $\bar{G}$. (This is
a special case of the lemmas in the Appendix because a bi-directed graph trivially satisfies
the boundary containment property.) It follows that, for all $i = 1, \ldots, k - 1$, $v_{i-1} \in \mathrm{Bd}(v_i)$
but $v_{i-1} \notin \mathrm{Bd}(v_{i+1})$, and similarly $v_{i+2} \notin \mathrm{Bd}(v_i)$ but $v_{i+2} \in \mathrm{Bd}(v_{i+1})$. This implies
that $\mathrm{Bd}(v_i) \not\subseteq \mathrm{Bd}(v_{i+1})$ and $\mathrm{Bd}(v_i) \not\supseteq \mathrm{Bd}(v_{i+1})$ for all $i = 1, \ldots, k - 1$. Since G has the
boundary containment property, it must hold that $v_i \leftrightarrow v_{i+1} \in G$ for all $i = 1, \ldots, k - 1$.
Therefore, $v_2, \ldots, v_{k-1}$ are colliders on the path $\pi = (v, v_1, \ldots, v_k, w)$ in G. Similarly, it
follows that $v_2 \in \mathrm{Bd}(v_1) \setminus \mathrm{Bd}(v)$, which entails $\mathrm{Bd}(v_1) \not\subseteq \mathrm{Bd}(v)$. Thus, $v_1$ is a collider on
$\pi$. Analogously, we can show that $v_k$ is a collider on $\pi$, which yields that $\pi$ is a path in
G that m-connects v and w given C. □

3. Simplicial graphs

In this section we show how simplicial vertex sets of a bi-directed graph can be used
to construct a Markov equivalent maximal ancestral graph by removing arrowheads from
certain bi-directed edges. Simplicial sets are also important in other contexts such as
collapsibility (Kauermann, 1996; Lauritzen, 1996; Madigan and Mosurski, 1990, §2.1.3,
p. 121 and 219) and triangulation of graphs (Jensen, 2001, §5.3).

Definition 6. A vertex $v \in V$ is simplicial, if $\mathrm{Bd}(v)$ is complete, i.e., every pair of
vertices in $\mathrm{Bd}(v)$ are adjacent. Similarly, a set $A \subseteq V$ is simplicial, if $\mathrm{Bd}(A)$ is complete.

Simplicial vertices can be characterized in terms of boundary containment as follows.

Proposition 7. A vertex $v \in V$ is simplicial iff $\mathrm{Bd}(v) \subseteq \mathrm{Bd}(w)$ for all $w \in \mathrm{Bd}(v)$.

If an edge between v and w has an arrowhead at v, then we say that we drop the
arrowhead at v when either $v \leftarrow w$ is replaced by $v - w$ or $v \leftrightarrow w$ is replaced by $v \to w$.

Definition 8. Let G be a bi-directed graph. The simplicial graph $G^{s}$ is the simple mixed
graph obtained by dropping all the arrowheads at simplicial vertices of G.
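Definitions 6 and 8 translate directly into a short computation. The sketch below is an illustration under stated assumptions: the function names and the output encoding ("-", "->", "<->" keyed by ordered pairs, with "->" pointing from the first to the second vertex) are hypothetical, and the input bi-directed graph is given by its skeleton as a list of pairs. The example uses the bi-directed path on four vertices, which is what the independences listed in the Introduction imply for Figure 1.

```python
from itertools import combinations


def closed_boundaries(vertices, edges):
    """Return Bd(v) = bd(v) united with {v} for every vertex of the skeleton."""
    Bd = {v: {v} for v in vertices}
    for v, w in edges:
        Bd[v].add(w)
        Bd[w].add(v)
    return Bd


def simplicial_vertices(vertices, edges):
    """Vertices v whose closed boundary Bd(v) is complete (Definition 6)."""
    Bd = closed_boundaries(vertices, edges)
    return {v for v in vertices
            if all(b in Bd[a] for a, b in combinations(Bd[v], 2))}


def simplicial_graph(vertices, edges):
    """Drop the arrowheads at simplicial vertices of a bi-directed graph."""
    simp = simplicial_vertices(vertices, edges)
    result = {}
    for v, w in edges:
        if v in simp and w in simp:
            result[(v, w)] = "-"
        elif v in simp:
            result[(v, w)] = "->"   # tail at v, arrowhead at w
        elif w in simp:
            result[(w, v)] = "->"   # tail at w, arrowhead at v
        else:
            result[(v, w)] = "<->"
    return result


path = [(1, 2), (2, 3), (3, 4)]
print(simplicial_vertices([1, 2, 3, 4], path))  # {1, 4}
print(simplicial_graph([1, 2, 3, 4], path))
```

For this example the result is 1 -> 2 <-> 3 <- 4, which has no arrowheads at the vertices 1 and 4.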

For the graph G from Figure 1, $G^{s}$ is equal to the depicted graph $G^{\min}$; additional
examples are given in Figure 3. Parts (i) and (ii) of the next lemma show that simplicial
graphs have the boundary containment property.

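Lemma 9. Let v and w be adjacent vertices in a simplicial graph $G^{s}$. Then

(i) if $v - w \in G^{s}$, then $\mathrm{Bd}(v) = \mathrm{Bd}(w)$;

(ii) if $v \to w \in G^{s}$, then $\mathrm{Bd}(v) \subseteq \mathrm{Bd}(w)$;

(iii) if $v \leftrightarrow w \in G^{s}$, then each of $\mathrm{Bd}(v) = \mathrm{Bd}(w)$, $\mathrm{Bd}(v) \subsetneq \mathrm{Bd}(w)$, and
$\mathrm{Bd}(v) \not\subseteq \mathrm{Bd}(w) \not\subseteq \mathrm{Bd}(v)$ might be the case.

Proof. (i) and (ii) follow from Proposition 7. For (iii) see, respectively, the graphs $G_1^{s}$,
$G_2^{s}$ in Figure 3 and $G^{s} = G^{\min}$ in Figure 1. □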

Theorem 10. The simplicial graph $G^{s}$ of a bi-directed graph G is a maximal ancestral
graph that is Markov equivalent to G.

Proof. By Lemma 3, Theorem 5 and Lemma 9, it suffices to show that $G^{s}$ is an ancestral
graph. This, however, follows from Lemma 11 below. □

Lemma 11. If G is an ancestral graph that has the boundary containment property, then
dropping all arrowheads at simplicial vertices of G yields an ancestral graph.

Proof. Let $\bar{G}$ be the graph obtained by dropping the arrowheads at simplicial vertices.
First, suppose $v \to w \in \bar{G}$ or $v \leftrightarrow w \in \bar{G}$ but that there is a path $\pi$ from w to v that is a
directed path in $\bar{G}$. Since there are no arrowheads at simplicial vertices in $\bar{G}$, no vertex
on $\pi$ including the endpoints v and w can be simplicial. This implies that $\pi$ is a directed
path from w to v in G. However, since $v \to w \in G$ or $v \leftrightarrow w \in G$, this is a contradiction
to G being ancestral. We conclude that $\bar{G}$ satisfies conditions (i) and (iii) of Definition 1.
Next, suppose $v - w \in \bar{G}$ but that there exists another vertex u such that $u \to v \in \bar{G}$
or $u \leftrightarrow v \in \bar{G}$. It follows that v is not simplicial. Since G is ancestral, this implies that
$v \to w \in G$, which in turn implies that $\mathrm{Bd}(v) \subseteq \mathrm{Bd}(w)$ because G has the boundary
containment property. The set $\mathrm{Bd}(v)$ is not complete because v is not simplicial. Thus
$\mathrm{Bd}(w)$ is not complete, i.e., w is not a simplicial vertex. However, this is a contradiction
to the fact that $v \to w \in G$ but $v - w \in \bar{G}$. Thus, $\bar{G}$ is indeed an ancestral graph. □

Proposition 12. A bi-directed graph G is Markov equivalent to an undirected graph iff
the simplicial graph $G^{s}$ induced by G is an undirected graph iff G is a disjoint union of
complete (bi-directed) graphs.

Proof. If $G^{s}$ is an undirected graph, then by Theorem 10, G is Markov equivalent to an
undirected graph, namely $G^{s}$. Conversely, assume that there exists an undirected graph
U that is Markov equivalent to G. Necessarily, G and U have the same skeleton (recall
Lemma 3). By Theorem 5, U has the boundary containment property, which implies
that every vertex is simplicial and thus that $G^{s}$ is an undirected graph (and equal to U).

The simplicial graph $G^{s}$ is an undirected graph iff the vertex set of the inducing
bi-directed graph G can be partitioned into pairwise disjoint sets $A_1, \ldots, A_q$ such that (a)
if $v \in A_i$, $1 \le i \le q$, and $w \in A_j$, $1 \le j \le q$, are adjacent, then $i = j$, and (b) all the
induced subgraphs $G_{A_i}$, $i = 1, \ldots, q$, are complete graphs (Kauermann, 1996). □

Figure 3. Bi-directed graphs with simplicial and minimally oriented graphs.



Under multivariate normality, a bi-directed graph that is Markov equivalent to an
undirected graph represents a hypothesis that is linear in the covariance matrix as well
as in its inverse. The general structure of such models is studied in Jensen (1988).



4. Minimally oriented graphs 

The simplicial graph $G^{s}$ sometimes may be a DAG. For example, the graph $u \leftrightarrow
v \leftrightarrow w$ has the simplicial graph $u \to v \leftarrow w$. However, there exist bi-directed graphs
that are Markov equivalent to a DAG and yet the simplicial graph contains bi-directed
edges. For example, the graph $G_1$ in Figure 3 is Markov equivalent to the DAG $G_1^{\min}$
in the same Figure. Hence, some arrowheads may be dropped from bi-directed edges
in a simplicial graph while preserving Markov equivalence. In this section we construct
maximal ancestral graphs from which no arrowheads may be dropped without destroying
Markov equivalence.

4.1. Definition and construction. The following definition introduces the key object 
of this section. 

Definition 13. Let G be a bi-directed graph. A minimally oriented graph of G is a graph
$G^{\min}$ that satisfies the following three properties:

(i) $G^{\min}$ is a maximal ancestral graph;

(ii) G and $G^{\min}$ are Markov equivalent;

(iii) $G^{\min}$ has the minimum number of arrowheads of all maximal ancestral graphs
that are Markov equivalent to G. Here the number of arrowheads of an ancestral
graph G with d directed and b bi-directed edges is defined as $\mathrm{arr}(G) = d + 2b$.
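As a small worked instance of the arrowhead count, consider the graphs of Figure 1 as described in the Introduction. The computation assumes that G is the bi-directed path $1 \leftrightarrow 2 \leftrightarrow 3 \leftrightarrow 4$ and that $G^{\min}$ is $1 \to 2 \leftrightarrow 3 \leftarrow 4$, which is consistent with the stated independences and with the absence of arrowheads at the vertices 1 and 4.

```latex
\[
\mathrm{arr}(G) = 0 + 2 \cdot 3 = 6,
\qquad
\mathrm{arr}(G^{\min}) = 2 + 2 \cdot 1 = 4 .
\]
```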

By Lemma 3, a minimally oriented graph $G^{\min}$ has the same skeleton as the underlying
bi-directed graph G. According to Theorem 5, $G^{\min}$ has the boundary containment
property. Examples of minimally oriented graphs are shown in Figure 3. Given the small
number of vertices of these graphs the claim that these graphs are indeed minimally
oriented graphs can be verified directly. The example of graph $G_1$ in Figure 3 also
illustrates that minimally oriented graphs are not unique. By symmetry, reversing the
direction of the edge $v \to w$ in the depicted $G_1^{\min}$ yields a second minimally oriented
graph for $G_1$.

We now turn to the problem of how to construct a minimally oriented graph. Define a
relation $\preceq_B$ on the vertex set V of the given bi-directed graph G by letting $v \preceq_B w$ if $v = w$
or if $\mathrm{Bd}(v) \subsetneq \mathrm{Bd}(w)$ in G. The relation $\preceq_B$ is a partial order and can thus be extended to
a total order $\le$ on V such that the strict boundary containment $\mathrm{Bd}(v) \subsetneq \mathrm{Bd}(w)$ implies
that $v < w$. In general, the choice of such an extension to a total order is not unique.

Algorithm 14. Let G be a bi-directed graph, and $\le$ a total order on V that extends the
partial order $\preceq_B$ obtained from strict boundary containment. Create a new graph $G^{\min}_{<}$
as follows:

(a) find the simplicial graph $G^{s}$ of G;

(b) set $G^{\min}_{<} = G^{s}$;

(c) replace every bi-directed edge $v \leftrightarrow w \in G^{\min}_{<}$ with $\mathrm{Bd}(v) \subseteq \mathrm{Bd}(w)$ and $v < w$ by
the directed edge $v \to w$.
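Steps (a) to (c) can be sketched in a few lines. The code below is a hedged illustration and not the authors' implementation: the function names, the output encoding and the particular total order are assumptions. The order sorts vertices by the size of $\mathrm{Bd}(v)$, which extends strict boundary containment because a strict subset has strictly fewer elements, and it breaks ties by the position of the vertex in the input list. The second example graph is hypothetical and chosen so that two non-simplicial vertices share the same closed boundary.

```python
from itertools import combinations


def minimally_oriented_graph(vertices, edges):
    """Sketch of Algorithm 14 for a bi-directed graph given by its skeleton.

    vertices: list; its order breaks ties when extending the partial order.
    edges: list of pairs (v, w), each standing for the bi-directed edge v <-> w.
    Returns a dict mapping (v, w) to "-", "->" (meaning v -> w) or "<->".
    """
    Bd = {v: {v} for v in vertices}
    for v, w in edges:
        Bd[v].add(w)
        Bd[w].add(v)

    # Step (a): a vertex is simplicial if Bd(v) is complete.
    simplicial = {v for v in vertices
                  if all(b in Bd[a] for a, b in combinations(Bd[v], 2))}

    # Total order: strict containment implies a smaller |Bd|, and the stable
    # sort keeps the caller's vertex order among ties.
    order = sorted(vertices, key=lambda v: len(Bd[v]))
    rank = {v: i for i, v in enumerate(order)}

    result = {}
    for v, w in edges:
        # Steps (a)-(b): arrowheads survive only at non-simplicial vertices.
        head_at_v, head_at_w = v not in simplicial, w not in simplicial
        # Step (c): orient a remaining bi-directed edge along the order.
        if head_at_v and head_at_w:
            if Bd[v] <= Bd[w] and rank[v] < rank[w]:
                head_at_v = False
            elif Bd[w] <= Bd[v] and rank[w] < rank[v]:
                head_at_w = False
        if head_at_v and head_at_w:
            result[(v, w)] = "<->"
        elif head_at_w:
            result[(v, w)] = "->"
        elif head_at_v:
            result[(w, v)] = "->"
        else:
            result[(v, w)] = "-"
    return result


def arrowheads(graph):
    """arr(G) = d + 2b for a graph in the encoding above."""
    return sum({"-": 0, "->": 1, "<->": 2}[kind] for kind in graph.values())


# Bi-directed path on four vertices: the bi-directed edge 2 <-> 3 must stay.
print(minimally_oriented_graph([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)]))

# Hypothetical graph with Bd(v) == Bd(w): the tie-break decides the direction.
E = [("x", "v"), ("x", "w"), ("y", "v"), ("y", "w"), ("v", "w")]
print(minimally_oriented_graph(["x", "y", "v", "w"], E)[("v", "w")])
print(minimally_oriented_graph(["x", "y", "w", "v"], E)[("w", "v")])
```

Changing the order of the input vertex list changes the tie-break and can therefore produce a different output graph with the same number of arrowheads.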

The notation $G^{\min}_{<}$ indicates the dependence of this graph on both the bi-directed graph
G and the total order $\le$. Clearly, by Theorem 5, in order for $G^{\min}_{<}$ to be a minimally
oriented graph it is necessary that it satisfies the boundary containment property. The
next lemma shows that this is true.

Lemma 15. Let G be a bi-directed graph and $G^{\min}_{<}$ the graph constructed in Algorithm 14.
It then holds that

(i) if $v - w$ is an undirected edge in $G^{\min}_{<}$, then $\mathrm{Bd}(v) = \mathrm{Bd}(w)$;

(ii) if $v \to w$ is a directed edge in $G^{\min}_{<}$, then $\mathrm{Bd}(v) \subseteq \mathrm{Bd}(w)$;

(iii) $v \leftrightarrow w$ is a bi-directed edge in $G^{\min}_{<}$ iff $\mathrm{Bd}(v) \not\subseteq \mathrm{Bd}(w) \not\subseteq \mathrm{Bd}(v)$.

Proof. (i) follows directly from Lemma 9(i) because it follows from Algorithm 14 that
$G^{\min}_{<}$ and $G^{s}$ contain the same undirected edges.

(ii) If the edge $v \to w$ is already present in $G^{s}$, then $\mathrm{Bd}(v) \subseteq \mathrm{Bd}(w)$ according to
Lemma 9(ii). If $v \to w$ is not already present in $G^{s}$, then $v < w$ and $\mathrm{Bd}(v) \subseteq \mathrm{Bd}(w)$.

(iii) Suppose v and w are two adjacent vertices such that $\mathrm{Bd}(v) \not\subseteq \mathrm{Bd}(w) \not\subseteq \mathrm{Bd}(v)$.
Then $v \leftrightarrow w$ in $G^{s}$ and this edge cannot be replaced by a directed edge in step (c)
of Algorithm 14. For the reversed claim, consider two adjacent vertices v and w such
that $\mathrm{Bd}(v) \subseteq \mathrm{Bd}(w)$. (The other case is symmetric.) If $v < w$ then according to the
definition of the simplicial graph and step (c) of Algorithm 14, the edge between v and w
in $G^{\min}_{<}$ cannot have an arrowhead at v and thus cannot be bi-directed. If $v > w$ then
$\mathrm{Bd}(v) = \mathrm{Bd}(w)$ because $\mathrm{Bd}(v) \subsetneq \mathrm{Bd}(w)$ would imply $v < w$. It follows that the edge
between v and w in $G^{\min}_{<}$ cannot be bi-directed as otherwise the arrowhead at w would
be removed in step (c). □

Figure 4. Induced subgraphs for which no arrowhead can be dropped
from edge $v \leftrightarrow w$.

By Lemma 15(iii), $v \leftrightarrow w \in G^{\min}_{<}$ iff there exist vertices $x \in \mathrm{bd}(v) \setminus \{w\}$ and $y \in
\mathrm{bd}(w) \setminus \{v\}$ such that the induced subgraph $G_{\{x, y, v, w\}}$ equals one of the two graphs
shown in Figure 4. Graphs that do not contain the four-cycle from Figure 4(ii) as an
induced subgraph are known as chordal or decomposable and play an important role in
graphical modelling (Lauritzen, 1996). Graphs not containing the path from Figure 4(i)
as an induced subgraph are called cographs and have favorable computational properties
(Brandstädt et al., 1999). For instance, cographs can be recognized in linear time (Corneil
et al., 1985).



Theorem 16. The graph $G^{\min}_{<}$ constructed in Algorithm 14 is a minimally oriented graph
for the bi-directed graph G.

Proof. We verify the conditions (i) and (iii) of Definition 13. This is sufficient because
$G^{\min}_{<}$ has the boundary containment property (Lemma 15) and thus condition (i) implies
condition (ii) by Theorem 5.

(i) $G^{\min}_{<}$ is a maximal ancestral graph:

By Lemma 3 it suffices to show that $G^{\min}_{<}$ is an ancestral graph. Let v and w be adjacent
vertices such that $v - w \in G^{\min}_{<}$. This is equivalent to $v - w \in G^{s}$, and it follows that there
does not exist an arrowhead at v or w; compare the proof of Theorem 10. Furthermore,
$G^{\min}_{<}$ does not contain any directed cycles because Algorithm 14 ensures that the presence
of a directed edge $v \to w \in G^{\min}_{<}$ implies $v < w$ in the total order. Finally, assume that
there exists $v \leftrightarrow w \in G^{\min}_{<}$. Then there cannot be a directed path from v to w, since by
Lemma 15(ii) this would imply $\mathrm{Bd}(v) \subseteq \mathrm{Bd}(w)$, contradicting Lemma 15(iii).

(iii) $G^{\min}_{<}$ has the minimal number of arrowheads:

Let $\bar{G}$ be a maximal ancestral graph that is Markov equivalent to the (bi-directed) graph
G, which requires that $\bar{G}$ and G, and thus also $G^{\min}_{<}$, have the same skeleton. Assume
that $\mathrm{arr}(\bar{G}) < \mathrm{arr}(G^{\min}_{<})$. Then either (a) there exists $v \to w \in G^{\min}_{<}$ such that $v - w \in \bar{G}$
or (b) there exists $v \leftrightarrow w \in G^{\min}_{<}$ such that $v \to w \in \bar{G}$ or $v - w \in \bar{G}$.

Case (a): If $v \to w \in G^{\min}_{<}$, then w cannot be simplicial. Hence, there exist two
vertices $x, y \in \mathrm{bd}(w)$ that are not adjacent in $G^{\min}_{<}$, and thus not adjacent in $\bar{G}$ ($v = x$
is possible). The global Markov property of G states that $x \perp\!\!\!\perp y$. Since $\bar{G}$ is an ancestral
graph and $v - w \in \bar{G}$, however, there may not be any arrowheads at w on the edges
between x and w, and y and w in $\bar{G}$. Therefore, x and y are m-connected given $\emptyset$ in $\bar{G}$,
which yields that the global Markov property of $\bar{G}$ does not imply $x \perp\!\!\!\perp y$; a contradiction.

Case (b): Suppose $v \leftrightarrow w \in G^{\min}_{<}$ but there is no arrowhead at v on the edge between
v and w in $\bar{G}$. By Lemma 15(iii) there exists $x \in \mathrm{bd}(v) \setminus \mathrm{Bd}(w)$ such that x and w are
not adjacent in $G^{\min}_{<}$. Thus x and w are not adjacent in $\bar{G}$ and $x \perp\!\!\!\perp w$ is stated by the
global Markov property for G. In $\bar{G}$, however, v is a non-collider on the path $(x, v, w)$
and thus this path m-connects x and w given $\emptyset$, which yields that the global Markov
property of $\bar{G}$ does not imply $x \perp\!\!\!\perp w$; a contradiction. □

The next result shows that our construction of minimally oriented graphs is complete
in the sense that every minimally oriented graph can be obtained as the output of
Algorithm 14 by appropriate choice of a total order on the vertex set.

Theorem 17. If $G^{\min}$ is a minimally oriented graph for a bi-directed graph G, then there
exists a total order $\le$ on the vertex set such that $G^{\min} = G^{\min}_{<}$.

Proof. The graph $G^{\min}$ is an ancestral graph and thus contains no directed cycles. Hence,
the directed edges in $G^{\min}$ yield a partial order $\preceq_D$ on the vertex set V in which $v \preceq_D w$
if $v = w$ or if there is a directed path from v to w. Define the relation $\preceq_{BD}$ by letting
$v \preceq_{BD} w$ if $v \preceq_B w$ or $v \preceq_D w$. Clearly, $v \preceq_{BD} v$, i.e., the relation is reflexive. We claim
that the relation is in fact a partial order.

By Theorem 5, $G^{\min}$ has the boundary containment property such that $\mathrm{Bd}(v) \subseteq \mathrm{Bd}(w)$
if $v \preceq_D w$. Consequently, if $v \neq w$ then $v \preceq_D w$ implies $w \not\preceq_B v$ and $v \preceq_B w$ implies
$w \not\preceq_D v$. This implies that $\preceq_{BD}$ is anti-symmetric. In order to verify transitivity, it
suffices to consider three distinct vertices satisfying $v \preceq_B w \preceq_D u$ or $v \preceq_D w \preceq_B u$. In
the former case $\mathrm{Bd}(v) \subsetneq \mathrm{Bd}(w) \subseteq \mathrm{Bd}(u)$, and in the latter case $\mathrm{Bd}(v) \subseteq \mathrm{Bd}(w) \subsetneq \mathrm{Bd}(u)$.
In both cases $\mathrm{Bd}(v) \subsetneq \mathrm{Bd}(u)$ such that $v \preceq_B u$, which implies the required conclusion
$v \preceq_{BD} u$.

We can now choose a total order $\le$ on V that extends the partial order $\preceq_{BD}$ and thus
extends both $\preceq_B$ and $\preceq_D$. Let $G^{\min}_{<}$ be the output of Algorithm 14 when the bi-directed
graph G and the chosen total order $\le$ are given as the input. We claim that $G^{\min}_{<} = G^{\min}$.

First note that if v is a simplicial vertex of G, then there are no arrowheads at v in
$G^{\min}$. Otherwise, we could drop all arrowheads at simplicial vertices in $G^{\min}$ to obtain
an ancestral graph (Lemma 11) with fewer arrowheads. The new graph would have the
boundary containment property and thus be Markov equivalent to G (by Theorem 5).
This would contradict the assumed minimality of $G^{\min}$.

The observation about simplicial vertices implies that an undirected edge in the
simplicial graph $G^{s}$ is also an undirected edge in $G^{\min}$. Conversely, if $v - w \in G^{\min}$ then
there may not be an arrowhead at v on any other edge, and likewise for w, because
$G^{\min}$ is ancestral. Since $G^{\min}$ has the boundary containment property, it follows from
Proposition 7 that both v and w are simplicial vertices. This implies that $v - w \in G^{s}$
and we conclude that $G^{\min}$ and $G^{s}$ have the same undirected edges. By construction,
the same holds for $G^{\min}_{<}$ and $G^{s}$. Hence, $G^{\min}$ and $G^{\min}_{<}$ have the same undirected edges.

Suppose $v \to w \in G^{\min}$. Then $\mathrm{Bd}(v) \subseteq \mathrm{Bd}(w)$ because $G^{\min}$ has the boundary
containment property. Moreover, $v < w$ because the total order $\le$ extends $\preceq_D$. It
follows that $v \to w \in G^{\min}_{<}$. In other words, every directed edge in $G^{\min}$ is also in $G^{\min}_{<}$.
This together with the fact that $G^{\min}$ and $G^{\min}_{<}$ have the same skeleton and the same
number of arrowheads, $\mathrm{arr}(G^{\min}) = \mathrm{arr}(G^{\min}_{<})$, implies that $G^{\min} = G^{\min}_{<}$. □

4.2. Markov equivalence results. The following corollary is an immediate consequence
of Proposition 12 because a minimally oriented graph $G^{\min}$ is an undirected
graph iff $G^{s}$ is an undirected graph.

Corollary 18. Let $G^{\min}$ be a minimally oriented graph for a bi-directed graph G. If G
is Markov equivalent to an undirected graph U, then $G^{\min} = U$ is the unique minimally
oriented graph of G.

A minimally oriented graph also reveals whether the original bi-directed graph is 
Markov equivalent to a DAG. 

Theorem 19. Let $G^{\min}$ be a minimally oriented graph for a bi-directed graph G. Then
G is Markov equivalent to a DAG iff $G^{\min}$ contains no bi-directed edges.

Proof. Let G be a bi-directed graph such that $G^{\min}$ contains no bi-directed edges. If $A \subseteq
V$ is a simplicial set, then the induced subgraph $(G^{\min})_A$ is undirected and complete (this
follows directly from Theorem 17 and Algorithm 14). Let $A_1, \ldots, A_q$ be the
inclusion-maximal simplicial sets of G. Let D be a directed graph obtained by replacing each
induced subgraph $(G^{\min})_{A_i}$, $i = 1, \ldots, q$, by a complete DAG. Then D itself has to be
acyclic, which can be seen as follows: First, since $G^{\min}$ does not contain any directed
cycles, a directed cycle $\pi$ in D must involve a vertex $v \in \cup_{i=1}^{q} A_i$. Let $v \in A_j$. Since the
induced subgraphs $D_{A_i}$, $i = 1, \ldots, q$, are all acyclic, $\pi$ must also involve a vertex not in
$A_j$. Therefore, there exists an edge $x \to w$ on $\pi$ such that $w \in A_j$ and $x \notin A_j$. Since the
sets $A_i$ are inclusion-maximal simplicial sets, no vertex in $A_i$, $i \neq j$, is adjacent to any
vertex in $A_j$. Hence, $x \notin \cup_{i=1}^{q} A_i$, which implies that the edge $x \to w$ is also present in
$G^{\min}$. This is a contradiction to w being a simplicial vertex.

Two vertices are adjacent in $G^{\min}$ iff they are adjacent in D. Moreover, D has the
boundary containment property because $G^{\min}$ has this property, and if $v \to w$ in D then
either $v \to w$ in $G^{\min}$ or $v - w$ in $G^{\min}$. It thus follows from Theorem 5 that D is Markov
equivalent to $G^{\min}$ and G.

Conversely, suppose that $v \leftrightarrow w \in G^{\min}$ and, for contradiction, that G is Markov
equivalent to a DAG D. Note that D must have the same skeleton as G (and $G^{\min}$). By
Lemma 15(iii), there exist two different vertices $x \in \mathrm{bd}(v) \setminus \{w\}$ and $y \in \mathrm{bd}(w) \setminus \{v\}$ such
that, by the Markov property of G, $x \perp\!\!\!\perp w$ and $v \perp\!\!\!\perp y$. Hence, v and w must be colliders on
the paths $(x, v, w)$ and $(v, w, y)$ in D, respectively. This is impossible in the DAG D. □

Theorem 19 can be shown to be equivalent to a Markov equivalence result stated
without proof in Theorem 1 in Pearl and Wermuth (1994). This latter theorem requires
'no chordless four-chain', which must be read as excluding graphs with induced subgraphs
that are either of the graphs in Figure 4. Under this condition, Pearl and Wermuth (1994)
also state that a Markov equivalent DAG can be constructed from the (undirected)
skeleton of G by introducing directed and bi-directed edges in an operation they term
'sink orientation', and turning remaining undirected edges into directed ones. The sink
orientation of the graph $G_1$ in Figure 3 has the directed edges of $G_1^s$ but an undirected
edge $v - w$. Thus sink orientation need not yield an ancestral graph. The bi-directed
graphical models considered in Theorem 19 also appear in the construction of generalized
Wishart distributions (Letac and Massam, 2007, Thm. 2.2). In that context the models
are called homogeneous and characterized in terms of Hasse diagrams.

As the next result reveals, bi-directed graphs that are Markov equivalent to DAGs 
exhibit a structure that corresponds to a multivariate regression model. The graphs can 
also be termed chordal cographs; compare the paragraph before Theorem [TBI 

Proposition 20. Let $G^{\min}$ be a minimally oriented graph for a connected bi-directed
graph G. If $G^{\min}$ contains no bi-directed edges, then the set A of all simplicial vertices
is non-empty, the induced subgraph $(G^{\min})_A$ is a disjoint union of complete undirected
graphs, the induced subgraph $(G^{\min})_{V \setminus A}$ is a complete DAG, and an edge
$v \to w$ joins any two vertices $v \in A$ and $w \notin A$ in $G^{\min}$.

Proof. For two adjacent vertices v and w in $G^{\min}$, Lemma 15(i)-(ii) implies that
$\mathrm{Bd}(v) \subseteq \mathrm{Bd}(w)$ or $\mathrm{Bd}(w) \subseteq \mathrm{Bd}(v)$. Hence, we
can list the vertex set as $V = \{v_1, \dots, v_p\}$ such that
$\mathrm{Bd}(v_i) \subseteq \mathrm{Bd}(v_j)$ if $v_i$ and $v_j$ are adjacent and $i < j$. It
follows that $v_1 \in A$ and thus $A \neq \emptyset$. Let $A_1, \dots, A_q$ be the
inclusion-maximal simplicial sets of G. Then $(G^{\min})_A$ equals the union of the disjoint
complete undirected graphs $(G^{\min})_{A_1}, \dots, (G^{\min})_{A_q}$. Since $G^{\min}$ is
an ancestral graph, $(G^{\min})_{V \setminus A}$ is a DAG.

We prove the remaining claims by induction on $|V \setminus A|$. If $|V \setminus A| = 0$, then
the connected graph $G^{\min}$ is a complete undirected graph and there is nothing to show. Let
$|V \setminus A| \geq 1$. It follows that $v_p \in V \setminus A$. If the shortest path between
some vertex $v_{i_1}$ and $v_p$ in G is of the form
$v_{i_1} \leftrightarrow v_{i_2} \leftrightarrow \dots \leftrightarrow v_{i_k} \leftrightarrow v_p$,
then $i_1 < \dots < i_k < p$ and
$\mathrm{Bd}(v_{i_1}) \subseteq \dots \subseteq \mathrm{Bd}(v_{i_k}) \subseteq \mathrm{Bd}(v_p)$,
which is easily shown by induction on k. However, since $v_{i_1} \in \mathrm{Bd}(v_{i_1})$
it must in fact hold that $v_{i_1}$ and $v_p$ are adjacent. Hence, there is an edge between every
vertex $v \in V \setminus \{v_p\}$ and $v_p$, which for $v \in A$ is of the form $v \to v_p$
because clearly $v_p \notin A$. The proof is finished by combining what we learned about $v_p$
with the induction assumption applied to the induced subgraph $G_W$ with
$W = \{v_1, \dots, v_{p-1}\}$. Note that for $v, w \in W$, the inclusion
$\mathrm{Bd}_G(v) \subseteq \mathrm{Bd}_G(w)$ implies that
$\mathrm{Bd}_{G_W}(v) \subseteq \mathrm{Bd}_{G_W}(w)$. Thus by Lemma 15 and Theorem 16,
$(G_W)^{\min}$ does not contain any bi-directed edges. □

5. Maximum likelihood estimation in Gaussian models 

In this section we consider the Gaussian covariance models associated with bi-directed
graphs and demonstrate that the graphical constructions from Sections 3 and 4 can be
employed for more efficient computation of maximum likelihood estimates.

5.1. Covariance graphs and Gaussian ancestral graph models. Let G be a bi-directed
graph, and

(1) $P(G) = \{\Sigma \in \mathbb{R}^{V \times V} \mid \Sigma = (\sigma_{vw}) \text{ sym. pos. def.},\ \sigma_{vw} = 0 \ \forall (v,w) : v \neq w,\ v \leftrightarrow w \notin G\}$

be the cone of symmetric positive definite matrices with zero pattern induced by G. The
covariance graph model associated with G is the family of multivariate normal distributions
$N(G) = (\mathcal{N}_V(0, \Sigma) \mid \Sigma \in P(G))$. It can be shown that every
distribution in N(G) satisfies all conditional independences stated by the global Markov
property for the bi-directed graph G (Kauermann, 1996, Prop. 2.2). Conversely, if a
distribution $\mathcal{N}_V(0, \Sigma)$ satisfies the global Markov property for G, then
$\Sigma \in P(G)$.

Let $S \in \mathbb{R}^{V \times V}$ be the empirical covariance matrix computed from an i.i.d.
sample drawn from some unknown distribution $\mathcal{N}_V(0, \Sigma) \in N(G)$, i.e., the
(v, w)-th entry in S is the dot-product of the vectors of observations for the v-th and w-th
variables divided by the sample size n. The log-likelihood function
$\ell_{S,n} : P(G) \to \mathbb{R}$ of N(G) can be written as

(2) $\ell_{S,n}(\Sigma) = -\frac{n|V|}{2} \log(2\pi) - \frac{n}{2} \log|\Sigma| - \frac{n}{2} \operatorname{tr}(\Sigma^{-1} S)$.

If S is positive definite then the global maximum of $\ell_{S,n}$ over P(G) exists. The
likelihood equations obtained by setting to zero the partial derivatives of $\ell_{S,n}$ with
respect to the non-restricted entries in $\Sigma$ take on the form

(3) $(\Sigma^{-1})_{vw} = (\Sigma^{-1} S \Sigma^{-1})_{vw} \quad \forall v, w : v = w \text{ or } v \leftrightarrow w \in G$;

compare Anderson and Olkin (1985, §2.1.1). A matrix $\hat\Sigma(S) \in P(G)$ that solves (3) is a
solution to the likelihood equations of N(G). Since subsequent theorems on the structure
of the likelihood equations are obtained via Gaussian ancestral graph models, we briefly
review the parametrization of these models.
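
As a brief sketch of where the likelihood equations come from, the following derivation differentiates the Gaussian log-likelihood with respect to one free entry $\sigma_{vw}$. The symbols $E^{vw}$ (the symmetric 0/1 matrix with ones in positions (v, w) and (w, v)) and $c_{vw}$ are notation introduced here for illustration only.

```latex
% Standard matrix-calculus identities, applied with d\Sigma / d\sigma_{vw} = E^{vw}:
\frac{\partial}{\partial \sigma_{vw}} \log|\Sigma|
  = \operatorname{tr}\bigl(\Sigma^{-1} E^{vw}\bigr),
\qquad
\frac{\partial}{\partial \sigma_{vw}} \operatorname{tr}\bigl(\Sigma^{-1} S\bigr)
  = -\operatorname{tr}\bigl(\Sigma^{-1} E^{vw} \Sigma^{-1} S\bigr).

% Hence, with c_{vw} = 1 if v = w and c_{vw} = 2 otherwise:
\frac{\partial \ell_{S,n}}{\partial \sigma_{vw}}
  = \frac{n}{2}\, c_{vw}
    \Bigl[ \bigl(\Sigma^{-1} S \Sigma^{-1}\bigr)_{vw} - \bigl(\Sigma^{-1}\bigr)_{vw} \Bigr].
```

Setting this derivative to zero for every pair with v = w or $v \leftrightarrow w \in G$ gives one equation per free entry of $\Sigma$; the constants n/2 and $c_{vw}$ cancel.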

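Let G be an ancestral graph and $\mathrm{un}_G \subseteq V$ the set of vertices v that are such
that any edge with endpoint v has a tail at v. By Definition 1(i), $v - w \in G$ implies
$v, w \in \mathrm{un}_G$, and $v \leftrightarrow w \in G$ implies that $v, w \notin \mathrm{un}_G$.
Let $\Lambda$ be a symmetric positive definite $\mathrm{un}_G \times \mathrm{un}_G$ matrix such
that $\Lambda_{vw} \neq 0$ only if v = w or $v - w \in G$. Let $\Omega$ be a symmetric positive
definite $(V \setminus \mathrm{un}_G) \times (V \setminus \mathrm{un}_G)$ matrix such that
$\Omega_{vw} \neq 0$ only if v = w or $v \leftrightarrow w \in G$. Finally, let B be a
$V \times V$ matrix such that $B_{vw} \neq 0$ only if $w \to v \in G$. Define the
symmetric positive definite matrix

(4) $\Sigma(\Lambda, B, \Omega) = (I - B)^{-1} \begin{pmatrix} \Lambda^{-1} & 0 \\ 0 & \Omega \end{pmatrix} (I - B)^{-T}$,

where I is the identity matrix.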

Let N(G) be the Gaussian ancestral graph model associated with G, i.e., the family of
all centered normal distributions that are globally Markov with respect to G. As shown in
Richardson and Spirtes (2002, §8), the normal distribution $\mathcal{N}_V(0, \Sigma)$ with
$\Sigma = \Sigma(\Lambda, B, \Omega)$ defined in (4) is in N(G). Conversely, if G is maximal,
then for any $\mathcal{N}_V(0, \Sigma) \in N(G)$ there exist unique $\Lambda$, $\Omega$, B of the
above type such that $\Sigma = \Sigma(\Lambda, B, \Omega)$. (Note that
Richardson and Spirtes (2002) use B for what is here denoted by I − B.)
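
The parametrization can be tried out numerically. The sketch below is illustrative only: it assumes the ancestral graph 1 -> 2 <-> 3 <- 4 (taken here, as an assumption, to be a minimally oriented graph of the bi-directed four-chain), uses made-up parameter values, and relies on small hypothetical helper functions `inverse`, `matmul` and `transpose` written in plain Python. Instead of permuting the vertices so that the undirected part comes first, it places the blocks $\Lambda^{-1}$ and $\Omega$ directly at the matching indices.

```python
def inverse(m):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(m)
    a = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(col) for col in zip(*x)]


# Vertices 1, 2, 3, 4 are stored at indices 0, 1, 2, 3.
# Assumed graph: 1 -> 2 <-> 3 <- 4, so the tail-only vertices are {1, 4}.
lam = {0: 2.0, 3: 0.5}              # Lambda is diagonal: 1 and 4 are not adjacent
omega = [[1.0, 0.3], [0.3, 2.0]]    # Omega on {2, 3}; 0.3 belongs to the edge 2 <-> 3
B = [[0.0] * 4 for _ in range(4)]
B[1][0] = 0.7                       # B_21, edge 1 -> 2
B[2][3] = -1.2                      # B_34, edge 4 -> 3

# Middle factor of (4): Lambda^{-1} on {1, 4} and Omega on {2, 3}.
middle = [[0.0] * 4 for _ in range(4)]
for i, value in lam.items():
    middle[i][i] = 1.0 / value
for r, i in enumerate((1, 2)):
    for c, j in enumerate((1, 2)):
        middle[i][j] = omega[r][c]

i_minus_b = [[float(i == j) - B[i][j] for j in range(4)] for i in range(4)]
inv = inverse(i_minus_b)
sigma = matmul(matmul(inv, middle), transpose(inv))
```

Under these assumptions the entries `sigma[0][2]`, `sigma[0][3]` and `sigma[1][3]` vanish, which is the zero pattern of the bi-directed four-chain, while the entries for adjacent pairs are free to be non-zero.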



Since a bi-directed graph G and a minimally oriented graph $G^{\min}$ are Markov equivalent,
the parametrization map for $G^{\min}$,
$(\Lambda, B, \Omega) \mapsto \Sigma(\Lambda, B, \Omega)$, has image equal to
P(G). By Richardson and Spirtes (2002, Thm. 8.14, Lemma 8.22), we obtain the following
Lemma.

Lemma 21. Let G be a bi-directed graph. The covariance matrix $\Sigma(\Lambda, B, \Omega)$
solves the likelihood equations of N(G) iff $(\Lambda, B, \Omega)$ solves the likelihood
equations of $N(G^{\min})$.

5.2. Empirical maximum likelihood estimates. Using the graphical results established
earlier, we can show that over simplicial sets a solution to the likelihood equations
(3) agrees with its empirical counterpart in S.

Theorem 22. Let G be a bi-directed graph with associated covariance graph model N(G).
If $A \subseteq V$ is simplicial, S is a symmetric positive definite matrix, and
$\hat\Sigma(S) \in P(G)$ is a solution to the likelihood equations (3), then
$\hat\Sigma(S)_{A \times A} = S_{A \times A}$.

Proof. By Theorem 10, the covariance graph model N(G) and the Gaussian ancestral
graph model $N(G^s)$ based on the simplicial graph $G^s$ are equal. Let $N(G^s)$ be
parametrized by the precision matrix $\Lambda$, the matrix of regression coefficients B and the
covariance matrix $\Omega$ as described in §5.1. In particular, it follows from Richardson and
Spirtes (2002, Lemma 8.4) that if $\Sigma = \Sigma(\Lambda, B, \Omega)$, then
$(\Lambda^{-1})_{A \times A} = \Sigma_{A \times A}$.

The inclusion-maximal simplicial sets $A_1, \dots, A_q$ of G form a partition of
$\mathrm{un}_{G^s}$. The induced subgraphs $G^s_{A_i}$, $i = 1, \dots, q$, are complete
undirected graphs. It follows that $\Lambda$ is a block-diagonal matrix such that
$\Lambda_{vw} = 0$ if there does not exist an inclusion-maximal simplicial set $A_i$ such that
$v, w \in A_i$. Now the discussion in Richardson and Spirtes (2002, §8.5) and Lemma 21 imply
that every solution $(\hat\Lambda, \hat B, \hat\Omega)$ to the likelihood equations
in the Gaussian ancestral graph model $N(G^s)$ satisfies
$(\hat\Lambda^{-1})_{A_i \times A_i} = S_{A_i \times A_i}$ for all $i = 1, \dots, q$. Since
$A \subseteq A_j$ for some j, it holds that
$\hat\Sigma(S)_{A \times A} = (\hat\Lambda^{-1})_{A \times A} = S_{A \times A}$. □

Our graphical constructions also provide information on when maximum likelihood
estimates of conditional parameters are equal to their empirical counterparts. The
conditional parameters we consider are the regression coefficients and conditional variance
for the conditional distribution of variable v given its parents
$\mathrm{pa}(v) = \{w \in V \mid w \to v \in G^{\min}\}$ in a minimally oriented graph
$G^{\min}$. If $\mathrm{pa}(v) = \emptyset$, then conditioning variable v on
pa(v) is understood to yield the marginal distribution of v.

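Theorem 23. Let $G^{\min}$ be a minimally oriented graph for a bi-directed graph G, S a
symmetric positive definite matrix, and $\hat\Sigma(S) \in P(G)$ a solution to the likelihood
equations (3). If v is a vertex such that there is no vertex w with
$v \leftrightarrow w \in G^{\min}$, then the regression coefficients for v given pa(v) are

(5) $\hat\Sigma(S)_{v \times \mathrm{pa}(v)} \bigl[\hat\Sigma(S)_{\mathrm{pa}(v) \times \mathrm{pa}(v)}\bigr]^{-1} = S_{v \times \mathrm{pa}(v)} \bigl(S_{\mathrm{pa}(v) \times \mathrm{pa}(v)}\bigr)^{-1}$,

and the conditional variance for v given pa(v) is

(6) $\hat\Sigma(S)_{vv} - \hat\Sigma(S)_{v \times \mathrm{pa}(v)} \bigl[\hat\Sigma(S)_{\mathrm{pa}(v) \times \mathrm{pa}(v)}\bigr]^{-1} \hat\Sigma(S)_{\mathrm{pa}(v) \times v} = S_{vv} - S_{v \times \mathrm{pa}(v)} \bigl(S_{\mathrm{pa}(v) \times \mathrm{pa}(v)}\bigr)^{-1} S_{\mathrm{pa}(v) \times v}$.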



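Proof. If $\mathrm{pa}(v) = \emptyset$, then v is a simplicial vertex, and the claim reduces to
$\hat\Sigma(S)_{vv} = S_{vv}$, which follows from Theorem 22. Otherwise, using the
parametrization of $N(G^{\min})$, it follows from Richardson and Spirtes (2002, Thm. 8.7) that
if $\Sigma = \Sigma(\Lambda, B, \Omega)$, then

$\Sigma_{v \times \mathrm{pa}(v)} \bigl[\Sigma_{\mathrm{pa}(v) \times \mathrm{pa}(v)}\bigr]^{-1} = B_{v \times \mathrm{pa}(v)}$

and

$\Sigma_{vv} - \Sigma_{v \times \mathrm{pa}(v)} \bigl[\Sigma_{\mathrm{pa}(v) \times \mathrm{pa}(v)}\bigr]^{-1} \Sigma_{\mathrm{pa}(v) \times v} = \Omega_{vv}$.

If $\hat\Lambda$, $\hat B$, $\hat\Omega$ solve the likelihood equations for $N(G^{\min})$, then
$\hat B_{v \times \mathrm{pa}(v)}$ and $\hat\Omega_{vv}$ solve the likelihood equations of the
model in which all parameters in $\Lambda$, B, $\Omega$ except for
$B_{v \times \mathrm{pa}(v)}$ and $\Omega_{vv}$ are held fixed. It follows from Drton and
Richardson (2004b, §§5.1-2) that $\hat B_{v \times \mathrm{pa}(v)}$ and $\hat\Omega_{vv}$ are
equal to the empirical expressions on the right hand side of (5) and
(6), respectively. Applying Lemma 21 yields the claim. □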
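
Theorems 22 and 23 can be checked numerically in the smallest non-trivial case. The sketch below is an illustration under stated assumptions, not the authors' procedure: it takes the bi-directed graph 1 <-> 2 <-> 3, assumes that its minimally oriented graph is 1 -> 2 <- 3, uses a made-up empirical covariance matrix `S`, and defines hypothetical helpers `inverse` and `matmul`. It builds a candidate estimate from the empirical variances of the simplicial vertices 1 and 3 and the empirical regression of vertex 2 on its parents, then measures how well the likelihood equations (3) are satisfied.

```python
def inverse(m):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(m)
    a = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


# Made-up positive definite empirical covariance matrix for vertices 1, 2, 3.
S = [[2.0, 0.6, 0.3],
     [0.6, 1.5, -0.4],
     [0.3, -0.4, 1.0]]

v, pa = 1, [0, 2]    # vertex 2 (index 1) with assumed parents 1 and 3
S_pp = [[S[i][j] for j in pa] for i in pa]
coef = matmul([[S[v][j] for j in pa]], inverse(S_pp))[0]          # right side of (5)
cond_var = S[v][v] - sum(c * S[j][v] for c, j in zip(coef, pa))   # right side of (6)

# Candidate estimate: empirical variances on the simplicial vertices,
# sigma_13 = 0, and the entries implied by the regression of vertex 2.
sigma_hat = [[0.0] * 3 for _ in range(3)]
for c, j in zip(coef, pa):
    sigma_hat[j][j] = S[j][j]
    sigma_hat[v][j] = sigma_hat[j][v] = c * S[j][j]
sigma_hat[v][v] = cond_var + sum(c * c * S[j][j] for c, j in zip(coef, pa))

# Residual of the likelihood equations (3) over the non-restricted entries.
K = inverse(sigma_hat)
KSK = matmul(matmul(K, S), K)
free = [(0, 0), (1, 1), (2, 2), (0, 1), (1, 2)]
max_residual = max(abs(K[i][j] - KSK[i][j]) for i, j in free)
```

In this example `max_residual` is zero up to rounding error, so the candidate solves (3) even though it was assembled from empirical quantities alone, without any iteration.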
Table 1. Gene expression data. Empirical correlation matrix (above diagonal) and
maximum likelihood estimate (below diagonal). The italicized diagonal entries are
ratios between maximum likelihood and empirical variance estimates.



Remark 24. Iterative Conditional Fitting is a special purpose algorithm for maximum
likelihood estimation in covariance graph models (Chaudhuri et al., 2007; Drton and
Richardson, 2003). However, it does not exploit the results of Theorems 22 and 23.
On the other hand, if one runs the ancestral graph extension of iterative conditional
fitting described in Drton and Richardson (2004b) on a minimally oriented graph, then
unnecessary computations are avoided by implicitly exploiting Theorems 22 and 23. This
is illustrated in the example in Section 5.3.



If a bi-directed graph G has a minimally oriented graph without bi-directed edges
then G is Markov equivalent to a DAG (Theorem 19) and the likelihood equations have a
unique solution that is a rational function of the empirical covariance matrix S. However,
this is no longer true if there is a bi-directed edge in $G^{\min}$. In this case, G contains
one of the two graphs in Figure 4 as a subgraph; compare Lemma 15(iii). Solving the likelihood
equations for the bi-directed four-chain in Figure 4(i) is equivalent to computing the roots
of a quintic polynomial. There exist data for which this quintic has exactly three real
roots (Drton and Richardson, 2004a). Galois theory (Stewart, 1989, Lemma 14.7) implies
that for these data the quintic is unsolvable by radicals, i.e., the roots of the quintic and
thus the solutions to the likelihood equations cannot be computed from the data in finitely
many steps involving addition, subtraction, multiplication, division, or taking r-th roots.
(Geiger et al. (2006) obtain similar results in the context of undirected graphs.) Similarly,
solving the likelihood equations of the bi-directed four-cycle in Figure 4(ii) corresponds
to solving a polynomial equation system of degree 17. This can be verified in computer
algebra systems such as Singular (Greuel et al., 2005); see also Drton and Sullivant
(2007, §5). It is natural to conjecture that there exist data for which this system is also
unsolvable by radicals.



5.3. Example: Gene expression measurements. The application of covariance graph
models to gene expression data has been promoted in Butte et al. (2000). For illustration,
we select data from microarray experiments with yeast strands (Gasch et al., 2000).
We focus on eight genes involved in galactose utilization. Expression measurements for
all eight genes are available in n = 134 experiments, for which the empirical correlation
matrix is shown in the upper-diagonal part of Table 1.

For these data, the covariance graph model induced by the graph G in Figure 5(i) has
a deviance of 8.87 over 8 degrees of freedom, which indicates a good model fit; the p-value
computed using a chi-square distribution is 0.35. Figure 5(ii) shows the unique minimally
oriented graph $G^{\min}$. The maximum likelihood estimate obtained by fitting the model
to the correlation matrix is shown in the lower-diagonal part of Table 1; note that this
estimate is not a correlation matrix (not all the italicized diagonal entries are equal to
one). As predicted by Theorem 22, the submatrix over GAL1, GAL7, and GAL10 equals
the respective submatrix in the empirical correlation matrix. The regression coefficients
for the regression of GAL2 on all remaining variables are identical when computed from
the maximum likelihood versus the empirical estimate (Theorem 23).

The use of a minimally oriented graph $G^{\min}$ leads to a considerable gain in
computational efficiency in the iterative calculation of the maximum likelihood estimate
$\hat\Sigma$. With the identity matrix as starting value, iterative conditional fitting
(Remark 24) on the original bi-directed graph G performs eight multiple regressions per
iteration and converges after 103 iterations. Using the same starting value and termination
criterion, iterative conditional fitting on $G^{\min}$ converges after only 5 iterations and
requires only five multiple regressions per iteration (for the genes GAL2, GAL3, GAL4, GAL11,
and GAL80), of which the one for GAL2 has to be executed only in the first iteration.

As in any application of covariance graph models, one might question the assumption
of Gaussianity. Indeed there are 10 experiments in which the measurements for the
genes GAL1, GAL7, GAL10 and GAL80 come out to be large negative values, and one
in which GAL7 alone takes such a value. These appear to be outliers (standardized values
between -3 and -5), possibly produced by thresholding, as some values are identical.
However, the measurements for the other genes are well within the range of the observations
for the remaining 123 experiments. Thus it is unclear whether removing these 11
experiments from consideration is appropriate. If the 11 experiments are removed, then
the correlations among GAL1, GAL7 and GAL10 decrease to values between 0.38 and
0.60; the latter value is the maximum of all correlations. Nevertheless the deviance for G
increases only slightly to 10.09 (p-value 0.26). The iterative conditional fitting algorithm
based on G now converges after only 20 iterations rather than 103. However, this is still
four times as many iterations as required in iterative conditional fitting based on the
minimally oriented graph $G^{\min}$; recall that in addition each iteration is also simpler.
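
The two p-values quoted in this section can be reproduced from the reported deviances. The sketch below is illustrative: the function name `chi2_sf_even_df` is made up, and it uses the closed form of the chi-square upper tail probability that is available when the degrees of freedom are even, so that only the standard library is needed. It assumes that the model has 8 degrees of freedom in both fits, since the graph G is the same.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper tail probability P(X > x) for a chi-square variable with even df.

    For df = 2k the tail equals exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i    # term now equals half**i / i!
        total += term
    return math.exp(-half) * total


p_all_experiments = chi2_sf_even_df(8.87, 8)     # all 134 experiments
p_without_outliers = chi2_sf_even_df(10.09, 8)   # 11 experiments removed
```

Rounded to two decimals these values are 0.35 and 0.26, in agreement with the text.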

The original correlation matrix in Table 1 exhibits an apparent similarity of the rows
for GAL1, GAL7 and GAL10; this is also reflected in the graph G in which these variables
form a complete set and have the same spouses. Such symmetry could be investigated
further via a group symmetry model (Andersson and Madsen, 1998).



6. Conclusion 

We showed how to remove a maximal number of arrowheads from the edges of a bi-directed
graph G such that one obtains a maximal ancestral graph $G^{\min}$ that is Markov
equivalent to G. The graph $G^{\min}$, called a minimally oriented graph, reveals whether G
is Markov equivalent to an undirected graph, and also whether G is Markov equivalent
to a DAG.

For the (Gaussian) covariance graph model associated with G, a minimally oriented
graph yields an alternative parametrization that provides insight into likelihood
inference. The structure of the arrowheads in $G^{\min}$ allowed us to identify parts of the
covariance matrix for which the maximum likelihood estimates are equal to their empirical
counterparts (this applies to all solutions to the likelihood equations if, as occasionally
happens, there is more than one solution). This makes it possible to avoid or speed
up iterative estimation of the full covariance matrix. We also saw that the maximum
likelihood estimator of the covariance matrix in a covariance graph model is a rational
function of the empirical covariance matrix iff $G^{\min}$ contains no bi-directed edge. This is
similar to the results that identify decomposable models as the sub-class of all log-linear
and all covariance selection models (Dempster, 1972) for which the maximum likelihood
estimator is available in closed form.



Drton and Richardson (2008) formulate binary models based on the Markov property
of bi-directed graphs. For these models, the maximum likelihood estimator is available
in closed form if the model-inducing graph is Markov equivalent to a DAG. Moreover,
we verified that in the example of the graph G in Figure 1 the maximum likelihood
estimates of the marginal distributions of $X_1$ and $X_4$ are equal to the corresponding
empirical proportions. We thus believe that analogs to the Gaussian results established
here will hold in discrete models, but a general parametrization of discrete ancestral graph
models is required to fully access the potential of the results obtained in this paper.

Appendix A. Connecting paths and boundary containment 



In this appendix we prove results about graphs that satisfy the boundary containment
property from Definition 4. These results are used in the proof of Theorem 5.

Let v and w be two fixed distinct vertices that are m-connected given
$C \subseteq V \setminus \{v, w\}$ in a simple mixed graph G. Define $\Pi_G(v, w \mid C)$ to be
the set of paths that m-connect v and w given C in G, and let $\Pi_G^{\min}(v, w \mid C)$ be the
set of paths that are of minimal length among the paths in $\Pi_G(v, w \mid C)$.

Lemma 25. If a simple mixed graph G satisfies the boundary containment property,
$v_{i-1}$, $v_i$ and $v_{i+1}$ are three consecutive vertices on a path $\pi$ in G, and $v_i$ is
a non-collider on $\pi$, then $v_{i-1}$ and $v_{i+1}$ are adjacent.

Proof. If $v_i$ is a non-collider, then the edge between $v_i$ and $v_{i-1}$ or the edge between
$v_i$ and $v_{i+1}$ must have a tail at $v_i$. Suppose, without loss of generality, that the
latter is the case. Then $\mathrm{Bd}(v_i) \subseteq \mathrm{Bd}(v_{i+1})$ and thus
$v_{i-1} \in \mathrm{Bd}(v_{i+1})$, which is the claim. □
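
Lemma 25 can be checked mechanically on small graphs. The sketch below is illustrative and rests on an assumption about the definition, which lies outside this appendix: following the proof above, the boundary containment property is taken to mean that a tail at v on the edge between v and w implies Bd(v) ⊆ Bd(w), where Bd(v) consists of v and its adjacent vertices. The edge encoding and all function names are hypothetical.

```python
from itertools import combinations

# Each edge stores the marks at its two endpoints, "tail" or "arrow".
# Example graph (made up): 1 --- 2, 1 -> 3, 2 -> 3.
complete_graph = {
    (1, 2): ("tail", "tail"),
    (1, 3): ("tail", "arrow"),
    (2, 3): ("tail", "arrow"),
}
# Counterexample (made up): the chain 1 -> 2 -> 3 without an edge between 1 and 3.
chain = {
    (1, 2): ("tail", "arrow"),
    (2, 3): ("tail", "arrow"),
}


def mark_at(edges, v, w):
    """Mark at v on the edge between v and w, or None if v and w are not adjacent."""
    if (v, w) in edges:
        return edges[(v, w)][0]
    if (w, v) in edges:
        return edges[(w, v)][1]
    return None


def closed_boundary(edges, vertices, v):
    """Bd(v): the vertex v together with all vertices adjacent to it."""
    return {v} | {w for w in vertices if w != v and mark_at(edges, v, w) is not None}


def has_boundary_containment(edges, vertices):
    """Assumed definition: a tail at v on the edge v, w implies Bd(v) within Bd(w)."""
    for v in vertices:
        for w in vertices:
            if v != w and mark_at(edges, v, w) == "tail":
                bd_v = closed_boundary(edges, vertices, v)
                bd_w = closed_boundary(edges, vertices, w)
                if not bd_v <= bd_w:
                    return False
    return True


def lemma25_conclusion_holds(edges, vertices):
    """True if the two neighbours of every non-collider on a path are adjacent."""
    for mid in vertices:
        nbrs = [u for u in vertices if u != mid and mark_at(edges, mid, u) is not None]
        for a, b in combinations(nbrs, 2):
            non_collider = "tail" in (mark_at(edges, mid, a), mark_at(edges, mid, b))
            if non_collider and mark_at(edges, a, b) is None:
                return False
    return True


vertices = [1, 2, 3]
```

For `complete_graph` both checks succeed, whereas for `chain` the property fails at the edge 2 -> 3 and, accordingly, the non-collider 2 on the path (1, 2, 3) has non-adjacent neighbours.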

Lemma 26. Let G be a simple mixed graph, and
$\pi = (v, v_1, \dots, v_k, w) \in \Pi_G(v, w \mid C)$. Let $v_0 = v$ and $v_{k+1} = w$. If $v_i$
is a non-endpoint vertex on $\pi$ and there is an arrowhead at $v_i$ on the edge between
$v_{i-1}$ and $v_i$, then either (i) $v_i \in \mathrm{An}(C)$ or (ii) the path
$(v_i, v_{i+1}, \dots, v_k, w)$ is a directed path from $v_i$ to w.

Proof. Suppose the result is false. Let $v_j$ be the vertex closest to w satisfying the
antecedent of the Lemma, but not the conclusion. If $v_j$ is a collider, then by definition of
m-connection, $v_j \in \mathrm{An}(C)$, which is a contradiction. If $v_j$ is a non-collider
then $v_j \to v_{j+1}$ on $\pi$. If $v_{j+1} = w$, if $v_{j+1} \in \mathrm{An}(C)$, or if
$(v_{j+1}, \dots, v_k, w)$ is a directed path from $v_{j+1}$ to w, then clearly $v_j$ satisfies
the conclusion of the Lemma, which is a contradiction. But if
$v_{j+1} \notin \mathrm{An}(C) \cup \{w\}$ and $(v_{j+1}, \dots, v_k, w)$ is not a directed path
from $v_{j+1}$ to w then $v_{j+1}$ satisfies the conditions on $v_j$, but is closer to w, again
a contradiction. □

Lemma 27. If G is an ancestral graph that satisfies the boundary containment property
and $\pi = (v, v_1, \dots, v_k, w) \in \Pi_G^{\min}(v, w \mid C)$ then no non-consecutive
vertices on $\pi$ are adjacent.

Proof. Let $v_0 = v$ and $v_{k+1} = w$, and suppose for a contradiction that there are
non-consecutive vertices on the path $\pi$ which are adjacent. Let $(v_p, v_q)$ be a pair of
adjacent vertices which are furthest apart on the path, i.e., (p, q) maximizes the distance
$|r - s|$ among pairs of indices of adjacent vertices $v_r$ and $v_s$ on the path. Since $\pi$
is of minimal length, $v \neq v_p$ or $w \neq v_q$.

Suppose that $v \neq v_p$. By definition of (p, q), $v_{p-1}$ is not adjacent to $v_q$.
Consequently, by Lemma 25, $v_p$ is a collider on $(v_{p-1}, v_p, v_q)$, and thus the edge
between $v_{p-1}$ and $v_p$ has an arrowhead at $v_p$. It then follows by Lemma 26 that either
$v_p \in \mathrm{An}(C)$ or $(v_p, v_{p+1}, \dots, v_k, w)$ is a directed path from $v_p$ to w.
In the latter case $v_p \in \mathrm{An}(v_q)$, but there is an arrowhead at $v_p$ on the edge
between $v_p$ and $v_q$, which contradicts that G is ancestral. Hence
$v_p \in \mathrm{An}(C)$. If $v_q = w$ then the path $(v, v_1, \dots, v_p, v_q = w)$ is
m-connecting given C and shorter than $\pi$. Hence $v_q \neq w$. It then follows by the same
argument that $v_q$ is a collider on $(v_p, v_q, v_{q+1})$ and in An(C). However, this also
leads to a contradiction since then the path
$(v, v_1, \dots, v_p, v_q, v_{q+1}, \dots, v_k, w)$ is both m-connecting
given C and shorter than $\pi$.

The case where $w \neq v_q$ may be argued symmetrically. □

Corollary 28. If G is an ancestral graph that satisfies the boundary containment property
and $\pi = (v = v_0, v_1, \dots, v_k, v_{k+1} = w) \in \Pi_G^{\min}(v, w \mid C)$, then all the
non-endpoint vertices $v_1, \dots, v_k$ are colliders on $\pi$.

Proof. This follows directly from Lemma 27 and Lemma 25. □

Even though all non-endpoints on a path of the type described in Corollary 28 in
$\Pi_G^{\min}(v, w \mid C)$ are colliders, not all non-endpoints must be in the set C. For
example, in the graph $G^{\min}$ from Figure 3 the path (x, v, y) m-connects x and y given {w}
since the collider v is an ancestor of w. However, as the next Lemma shows, there will always
exist a path in $\Pi_G^{\min}(v, w \mid C)$ such that all non-endpoints are colliders in C. In
$G^{\min}$ from Figure 3, the path (x, w, y) m-connects x and y given {w}.

Lemma 29. If G is an ancestral graph that satisfies the boundary containment property,
and $\pi = (v = v_0, v_1, \dots, v_k, v_{k+1} = w) \in \Pi_G^{\min}(v, w \mid C)$ is such that
no other path in $\Pi_G^{\min}(v, w \mid C)$ has more non-endpoint vertices in C than $\pi$,
then all non-endpoint vertices $v_1, \dots, v_k$ on $\pi$ are colliders that are in C.

Proof. By Corollary 28 all non-endpoints $v_1, \dots, v_k$ are colliders. Assume that there
exists $v_i \notin C$, $1 \leq i \leq k$. Since $\pi \in \Pi_G(v, w \mid C)$, and thus
$v_i \in \mathrm{An}(C)$, there exists $c \in C$ such that
$v_i \to \dots \to c \in G$. In particular, $c \neq v_{i-1}$ and $c \neq v_{i+1}$ because $v_i$
is ancestral neither to $v_{i-1}$ nor to $v_{i+1}$. The boundary containment property and the
fact that G does not contain directed cycles imply that $v_i \to c \in G$. By Lemma 25, G
contains edges between c and both $v_{i-1}$ and $v_{i+1}$. Since the edge between $v_{i-1}$ and
$v_i$ has an arrowhead at $v_i$ and $v_i \to c \in G$, the edge between $v_{i-1}$ and c must
have an arrowhead at c because otherwise the fact that G is an ancestral graph would be
contradicted. Similarly, the edge between $v_{i+1}$ and c must have an arrowhead at c. If
$v_{i-1} \to c$, then $v_{i-2}$ is adjacent to c and by the same argument as above there must be
an arrowhead at c on the edge between $v_{i-2}$ and c. Repeating this argument yields that there
exists a vertex $v_\ell$, $\ell \leq i - 1$, such that either
$v_\ell \leftrightarrow c \in G$, or $v_\ell = v$ and $v \to c$. The same arguments also imply
that there exists a vertex $v_j$, $j \geq i + 1$, such that either
$v_j \leftrightarrow c \in G$, or $v_j = w$ and $w \to c$.
Therefore, the path $(v, v_1, \dots, v_\ell, c, v_j, \dots, v_k, w)$ is in
$\Pi_G(v, w \mid C)$ and is either shorter than $\pi$ or of equal length but with more
non-endpoint vertices in C. This contradicts the choice of $\pi$ and therefore the assumption
of a non-endpoint on $\pi$ that is not in C must be false. □